[SPARK-54634][SQL] Add clear error message for empty IN predicate #53390
Conversation
allisonwang-db
left a comment
Thanks for making the error message better!
  exception = parseException(sql2),
  condition = "PARSE_SYNTAX_ERROR",
- parameters = Map("error" -> "'IN'", "hint" -> ""))
+ parameters = Map("error" -> "'INTO'", "hint" -> ""))
What's the error message before and after this change for this test case?
Hey Allison,
Here is the output before and after this change for this test case:
Before:
scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'. SQLSTATE: 42601 (line 1, pos 25)
== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
-------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:285)
at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:97)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:54)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(AbstractSqlParser.scala:93)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$5(SparkSession.scala:492)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:491)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:490)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:504)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:513)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:91)
... 42 elided
After:
scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'INTO'. SQLSTATE: 42601 (line 1, pos 36)
== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:267)
at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:78)
at org.apache.spark.sql.execution.SparkSqlParser.super$parse(SparkSqlParser.scala:163)
at org.apache.spark.sql.execution.SparkSqlParser.$anonfun$parseInternal$1(SparkSqlParser.scala:163)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
at org.apache.spark.sql.execution.SparkSqlParser.parseInternal(SparkSqlParser.scala:163)
at org.apache.spark.sql.execution.SparkSqlParser.parseWithParameters(SparkSqlParser.scala:70)
at org.apache.spark.sql.execution.SparkSqlParser.parsePlanWithParameters(SparkSqlParser.scala:84)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$6(SparkSession.scala:573)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:572)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:563)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:591)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:682)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:92)
... 42 elided
Hey @allisonwang-db, could you check this output and let me know? Thanks!
allisonwang-db
left a comment
Thanks for the fix. Much better error message.
Thanks for approving the changes, @allisonwang-db. Do you happen to know when this PR might be merged?
cc @cloud-fan
  errorClass = "INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE",
  messageParameters = Map(
    "alternative" -> ("Consider using 'WHERE FALSE' if you need an always-false condition, " +
      "or provide at least one value in the IN list.")),
Why pass the alternative as an error parameter, instead of just putting it in the error message template?
Looking back, it's quite possible to directly include this alternative in the error message template. Shall I make this change?
yes please
Done, please check.
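For readers unfamiliar with the distinction being discussed above: the question is whether the suggestion text is substituted into the error message template at runtime as a parameter, or baked into the template itself so no parameter is needed. A minimal sketch of the two designs (plain Python with hypothetical names, not Spark's actual error framework):

```python
# Toy illustration of the two designs discussed above.
# Names and templates here are hypothetical, not Spark's error framework.

# Design 1: the suggestion is passed as a runtime parameter and
# substituted into a placeholder in the template.
TEMPLATE_WITH_PARAM = "Empty IN predicate is not allowed. <alternative>"

def format_with_param(alternative: str) -> str:
    return TEMPLATE_WITH_PARAM.replace("<alternative>", alternative)

# Design 2: the suggestion is fixed text inside the template itself,
# so every call site produces the same message with no parameters.
TEMPLATE_INLINE = (
    "Empty IN predicate is not allowed. Consider using 'WHERE FALSE' "
    "if you need an always-false condition, or provide at least one "
    "value in the IN list."
)

msg = format_with_param(
    "Consider using 'WHERE FALSE' if you need an always-false condition, "
    "or provide at least one value in the IN list."
)
# Both designs yield the same final text; design 2 is simpler when the
# suggestion never varies, which is the point the reviewer makes.
print(msg == TEMPLATE_INLINE)
```

Since the alternative text is constant for this error, inlining it into the template removes a parameter that every caller would have to pass identically.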
thanks, merging to master!
What changes were proposed in this pull request?
This PR addresses SPARK-54634.
It adds a user-friendly error message when users write SQL queries with an empty IN clause, like: SELECT * FROM table WHERE col IN ()
Why are the changes needed?
When users write SQL with an empty IN clause, Spark currently produces a generic [PARSE_SYNTAX_ERROR], which leads users to believe that their syntax is incorrect, whereas the actual issue is the absence of values in the IN list. The current error message therefore does not communicate the real problem to the user.
This change provides a clear, actionable error message that explains the actual problem and suggests alternatives.
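The general idea behind the fix can be sketched with a toy check (plain Python, illustrative only; Spark's real implementation lives in its SQL parser and uses its error-condition framework rather than a regex):

```python
import re

# Toy sketch of the idea behind this PR: detect an empty IN list
# explicitly and raise a targeted, actionable error instead of letting
# a generic syntax error surface. Illustrative only, not Spark's code.

class EmptyInPredicateError(ValueError):
    pass

def check_in_predicates(sql: str) -> None:
    # An IN keyword immediately followed by an empty parenthesized list.
    if re.search(r"\bIN\s*\(\s*\)", sql, re.IGNORECASE):
        raise EmptyInPredicateError(
            "[INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE] Empty IN predicate. "
            "Consider using 'WHERE FALSE' if you need an always-false "
            "condition, or provide at least one value in the IN list."
        )

check_in_predicates("SELECT * FROM range(10) WHERE id IN (1, 2)")  # passes
try:
    check_in_predicates("SELECT * FROM range(10) WHERE id IN ()")
except EmptyInPredicateError as e:
    print(e)  # the targeted message, instead of a generic syntax error
```

The key point is that the condition is recognized as a distinct error case with its own error class and a suggested rewrite, rather than falling through to the generic parse-error path.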
Example - Before:
Example - After:
Does this PR introduce any user-facing change?
Yes, users will now see a better error message.
Code executed:
spark.sql("SELECT * FROM range(10) WHERE id IN ()").show()

Before output: (screenshot in the original PR)
After output: (screenshot in the original PR)
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic) - used for code assistance, test generation, and documentation.